fix(rag): don't let a broken native dep crash every agent import #1534
A broken native dependency under the optional RAG stack — most commonly torchcodec/FFmpeg pulled in transitively by sentence-transformers, but also an arch-mismatched faiss build — raises RuntimeError/OSError at import, not ImportError. The guard in rag/sdk.py only caught ImportError, so the exception escaped and killed the entire import chain: importing gaia.agents.chat (or any agent that transitively imports RAG) died at module load even when RAG was never used, taking `gaia chat` and friends down with it. Broaden the sentence-transformers and faiss guards to treat any import failure as "not installed", and capture the failure reason so the loud, deferred error in RAGSDK._check_dependencies() distinguishes "not installed" (reinstall) from "installed but broken" (fix the underlying native dep, e.g. FFmpeg) instead of misdirecting the user.
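The difference between the narrow and the broadened guard can be sketched as follows. This is a minimal illustration, not the code in rag/sdk.py: `narrow_guard`, `broad_guard`, and the module name `fake_native_dep` are hypothetical, and the finder/loader pair only simulates a package whose native library fails to load at import.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class _BrokenLoader(importlib.abc.Loader):
    """Simulates a package whose native library fails while the module executes."""

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        # Native-load failures surface as RuntimeError/OSError, not ImportError.
        raise RuntimeError("simulated native load failure")


class _BrokenFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname == "fake_native_dep":  # hypothetical module name
            return importlib.util.spec_from_loader(fullname, _BrokenLoader())
        return None


sys.meta_path.insert(0, _BrokenFinder())


def narrow_guard(name):
    """Old behaviour: only a missing package is tolerated."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def broad_guard(name):
    """New behaviour: any import-time failure means 'unavailable'; keep the reason."""
    try:
        return importlib.import_module(name), None
    except Exception as exc:
        return None, exc
```

With this setup, `narrow_guard("fake_native_dep")` lets the `RuntimeError` escape to the caller, while `broad_guard("fake_native_dep")` returns `None` together with the captured exception, so the importing module keeps loading.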
Summary
Solid, well-scoped fix for a real problem: a broken native dep under … One blocking issue: the secondary goal, telling "not installed" apart from "installed but broken", doesn't actually work for the common case. Because the guard captures …
Issues Found
🟡 Important: genuinely-missing dep is mislabeled "installed but failed to load" (…
…lint The module-level error-capture assignments tripped pylint C0413 (wrong-import-position). Recover the failure reason inside _check_dependencies() via importlib instead, keeping the module's import block clean. Behaviour is unchanged.
🟡 Fix: patch (Same pattern for the …
…overable importlib.import_module bypasses builtins.__import__, so the recovery path that names an installed-but-broken dependency could never be exercised by the dependency-guard tests (they intercept imports), and in a deps-absent CI environment the broken-cause section was never produced — failing test_broken_install_reports_actionable_cause. Use the __import__ builtin to re-run the real import, which both surfaces the native load error in production and is interceptable in tests.
Why this matters
Before: if the optional RAG stack was installed but a native library underneath it couldn't load (most often torchcodec/FFmpeg pulled in transitively by `sentence-transformers`, or an arch-mismatched `faiss`), `gaia chat` and every other agent died at import time with a raw `RuntimeError`, even though RAG was never used. The crash had nothing to do with what the user was running; one broken wheel took the whole CLI down.

After: a broken optional dependency is treated the same as a missing one: the import succeeds, agents load, and if (and only if) RAG is actually used, `RAGSDK` raises a loud, actionable error that names the real cause ("installed but failed to load … e.g. a missing FFmpeg for torchcodec") instead of telling the user to reinstall a package they already have.

Root cause: the guard caught only `ImportError`, but native-load failures raise `RuntimeError`/`OSError`.

Test plan
- `python -m pytest tests/unit/rag/test_dependency_guard.py -v`: 3 pass
- `import gaia.agents.docqa.agent` no longer raises in an environment with broken torchcodec (previously crashed at module load)
- `RAGSDK` with a broken dep raises `ImportError` naming the captured cause; a genuinely missing dep gets install instructions only
- `black`/`isort` clean on changed files
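One way to keep "not installed" and "installed but broken" apart in the deferred error is to inspect the exception raised when the import is re-run. The sketch below is hypothetical and is not the body of `RAGSDK._check_dependencies()`: `describe_failure` and `check_dependency` are illustrative names, the `pip install` hint is a placeholder for the project's real install instructions, and it assumes a genuinely missing top-level package raises `ModuleNotFoundError` whose `name` equals the package being imported.

```python
def describe_failure(package, exc):
    """Build the user-facing message for a failed optional import."""
    # Assumption: a missing top-level package raises ModuleNotFoundError with
    # exc.name == package; anything else means the package is present but broken.
    if isinstance(exc, ModuleNotFoundError) and exc.name == package:
        return (
            f"{package} is not installed. "
            f"Install it with: pip install {package}"  # placeholder instruction
        )
    return (
        f"{package} is installed but failed to load "
        f"({type(exc).__name__}: {exc}). "
        "Fix the underlying native dependency instead of reinstalling."
    )


def check_dependency(package):
    """Re-run the real import and raise a single, actionable ImportError."""
    try:
        __import__(package)  # interceptable in tests via builtins.__import__
    except Exception as exc:
        raise ImportError(describe_failure(package, exc)) from exc
```

Checking `exc.name` rather than only the exception type matters because an installed package can itself raise `ModuleNotFoundError` for one of its own missing transitive imports; under this sketch that case is reported as broken, not as missing.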